Abstract

In this paper, we consider using total variation minimization to recover signals whose gradients have a sparse support from a small number of measurements. We establish a proof of the performance guarantee of total variation (TV) minimization in recovering one-dimensional signals with sparse gradient support. This partially answers the open problem of proving the fidelity of total variation minimization in such a setting [20]. In particular, we show that the recoverable gradient sparsity can grow linearly with the signal dimension when TV minimization is used. Recoverable sparsity thresholds of TV minimization are explicitly computed for one-dimensional signals by using the Grassmann angle framework. We also extend our results to TV minimization for multidimensional signals.

1 Introduction 

Compressed sensing has recently gained a lot of attention in many applications, including medical imaging, because it enables acquiring sparse signals from a much smaller number of samples than the ambient dimension of the signal. Compressed sensing takes advantage of the fact that most signals of interest in practice are sparse: there are only a few nonzero or large elements when the signals are represented over a certain dictionary, such as a wavelet basis. For these types of sparse or compressible signals, compressed sensing theory [51 ITU] has established that a small number of nonadaptive measurements are often sufficient to efficiently recover them under methods such as $\ell_1$ minimization [4, 6, 10].

Without loss of generality, let us assume that $x \in \mathbb{R}^N$ is a one-dimensional (compared with 2-dimensional images and 3-dimensional videos) signal vector of $N$ elements that has no more than $K$ ($K \ll N$) nonzero elements. In compressed sensing, we sample $x$ using $M$ ($M < N$) linear projections

$$y = Ax,$$

where $A$ is an $M \times N$ measurement matrix and $y$ is an $M \times 1$ measurement result vector. Knowing $A$ and the measurement result $y$, $\ell_1$ minimization is often used to recover the sparse $x$:

$$\min_x \|x\|_1 \quad \text{subject to} \quad y = Ax. \tag{1}$$

It has been shown that, under suitable conditions on the measurement matrix $A$, the original $x$ is guaranteed to be the unique solution to the $\ell_1$ minimization (1). In fact, if $A$ satisfies the so-called restricted isometry property (RIP), then the solution of (1) matches exactly with the original signal [5JE1EI]. Various results concerning the perfect reconstruction of the original signal by solving (1) have been established in [SJISJiMIllIZllMllSn].

The results above hold true only for sparse signals, and they can be extended to signals that are synthesized by a linear combination of a few atoms in a (redundant) dictionary with incoherent atoms [21]. However, there are numerous practical examples in which a signal of interest does not fall into the category where the aforementioned theory works. One such example is a signal that has a sparse gradient (i.e., the signal is piecewise constant), which arises frequently in imaging. Images with little detail are usually modelled as piecewise constant functions. For simplicity, we assume that $x \in \mathbb{R}^N$ is a vector generated from a 1-dimensional piecewise constant signal. Let $Dx$ be its finite difference defined by $[Dx]_i = x_{i+1} - x_i$ for $i = 1, 2, \ldots, N-1$. Since $x$ is piecewise constant, $Dx$ must be sparse. Assume that $Dx$ has only $K$ ($K \ll N$) nonzero entries. Let $y = Ax \in \mathbb{R}^M$ be $M$ linear samples of $x$. Then, to recover $x$, one usually solves

$$\min_x \|Dx\|_1 \quad \text{subject to} \quad y = Ax. \tag{2}$$

The regularization term $\|Dx\|_1$ is called the total variation (TV) of $x$. When $x \in \mathbb{R}^N$ is generated from $d$-dimensional signals, we only need to replace $D$ by the concatenation of directional finite differences, and $\|Dx\|_1$ is the anisotropic TV of $x$.
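As a concrete illustration of the definitions above, the following minimal Python sketch computes the finite difference $Dx$ and the total variation $\|Dx\|_1$ of a small piecewise constant vector. The helper names (`finite_difference`, `total_variation`, `gradient_sparsity`) and the example signal are illustrative assumptions, not part of the paper.

```python
def finite_difference(x):
    """Return Dx, where [Dx]_i = x_{i+1} - x_i (length N - 1)."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]


def total_variation(x):
    """Return the 1-D total variation ||Dx||_1."""
    return sum(abs(d) for d in finite_difference(x))


def gradient_sparsity(x):
    """Return the number of nonzero entries of Dx (the K in the text)."""
    return sum(1 for d in finite_difference(x) if d != 0)


# Hypothetical piecewise constant signal with N = 12 and two jumps.
x = [1.0] * 4 + [3.0] * 3 + [0.5] * 5
tv = total_variation(x)      # |3.0 - 1.0| + |0.5 - 3.0|
k = gradient_sparsity(x)     # number of jumps
```

The signal itself has twelve nonzero entries, so it is not sparse in the usual sense, but its gradient has only two nonzero entries; this is the structure that problem (2) exploits.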

TV regularization has been used extensively in the literature for decades in imaging sciences [1, 18, 23, 24] and other related fields [7, 26]. The minimization problem (2) has the same form as the minimization in analysis-based compressed sensing in [3]. However, the perfect reconstruction result in [3] cannot be applied to (2), as the rows of $D$ do not form a frame ($D$ has a nontrivial null space). Despite the great importance of TV minimization in applications, rigorous proofs of conditions for successfully recovering signals by TV minimization have only recently been established [19, 20]. To establish such conditions, [19, 20] first transformed $d$-dimensional ($d \geq 2$) signals with sparse gradients into signals compressible over the Haar orthogonal wavelet basis. Then a modified restricted isometry condition, which takes into account the Haar orthogonal wavelet transformation, was established for the matrix $A$ such that (2) offers a stable recovery of $x$. However, it is noted in [19, 20] that establishing conditions for successfully recovering a 1-dimensional (namely $d = 1$) signal vector remains an open problem. This is partially due to the fact that a small TV of a 1-dimensional signal does not necessarily imply fast decay of its Haar wavelet coefficients.

In this paper, we establish a proof of performance guarantees of TV minimization in recovering 1-dimensional signals with sparse gradient support. This partially answers the open problem of proving the fidelity of total variation minimization in such a setting [20]. Compared with [19, 20], our results do not use the restricted isometry condition, but instead work directly with the null space condition of the measurement matrix $A$. To establish the null space condition of interest, we use the "Escape through the Mesh" theorem [16, 22, 25] to estimate the Gaussian width [16, 22] of a cone specified by the null space condition. We then use the Grassmann angle framework to calculate the thresholds on gradient sparsity such that TV minimization (2) succeeds in recovery with high probability. We further extend our results to TV minimization for higher-dimensional signals. For $d \geq 2$, we obtain performance bounds for TV minimization comparable to the results in [19, 20]. In [5], an average-case phase transition was calculated for TV minimization by evaluating the asymptotic minimax mean square error (MSE) of TV minimization. Compared with [5], our results are more concerned with worst-case performance guarantees, which hold uniformly for all possible supports of the signal gradient.

The rest of this paper is organized as follows. In Section 2, we establish the performance guarantee of TV minimization for 1-dimensional signal vectors. Our proof is based on a null space condition introduced in Section 2.1, and two different arguments are presented in Section 2.2, using the escape through the mesh theorem, and in Section 2.3, via the Grassmann angle framework. In Section 3, we extend our results to TV minimization for multidimensional signals. Section 4 concludes our paper and discusses future directions.

2 One-dimensional Signals 

In this section, we establish the main result of this paper on the performance guarantee of TV minimization in recovering one-dimensional signals with sparse gradient support. Throughout this section, we assume that $x \in \mathbb{R}^N$ is generated from a one-dimensional signal and $Dx$ contains at most $K$ nonzeros. We assume that the entries of $A \in \mathbb{R}^{M \times N}$ are i.i.d. Gaussian random variables. We give two different arguments, namely, one using the "Escape through the Mesh" theorem [16, 22, 25] in Section 2.2 and one using the Grassmann angle framework in Section 2.3. These two arguments lead to two different recovery threshold bounds on the minimal $M$. Both arguments are based on a null space property of the matrix $A$, which is presented in Section 2.1.






2.1 The Null Space Condition for Successful Recovery via the TV Minimization 

In this section, we give a condition on the null space of the linear projection matrix $A$ such that TV minimization successfully recovers one-dimensional signals with sparse gradients. We remark that this condition is not new, and it has appeared in the proofs in [19, 20].

Theorem 2.1. Assume $A \in \mathbb{R}^{M \times N}$ and $y = Ax$. Then $x$ is the unique solution to (2) for all $x$ whose gradients $Dx$ have no more than $K$ nonzero elements (no matter what the support $\mathcal{K}$ of $Dx$ is) if and only if the following condition holds: for every nonzero vector $z$ in the null space of $A$ (namely $Az = 0$, $z \neq 0$),

$$\|(Dz)_{\mathcal{K}}\|_1 < \|(Dz)_{\mathcal{K}^c}\|_1 \quad \forall \mathcal{K} \subseteq \{1, 2, \ldots, N-1\} \ \text{s.t.} \ |\mathcal{K}| \leq K. \tag{3}$$

We omit the detailed proof of this theorem, since it is very similar to the proof of null space conditions for $\ell_1$ minimization; see, for example, [25, 30].
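For a single fixed vector $z$, condition (3) can be checked without enumerating all supports, because the worst support of size $K$ simply collects the $K$ largest entries of $|Dz|$. The Python sketch below implements this check; the function name and the test vectors are hypothetical. Verifying the condition for every $z$ in the null space of $A$, as the theorem requires, is a much harder problem that this snippet does not address.

```python
def null_space_condition_holds(z, K):
    """Check condition (3) for one vector z and sparsity level K.

    The inequality is hardest for the support holding the K largest |Dz|
    entries, so it suffices to test that single support.
    """
    abs_diffs = sorted((abs(z[i + 1] - z[i]) for i in range(len(z) - 1)),
                       reverse=True)
    worst_case_mass = sum(abs_diffs[:K])   # ||(Dz)_K||_1 for the worst K
    remaining_mass = sum(abs_diffs[K:])    # ||(Dz)_{K^c}||_1
    return worst_case_mass < remaining_mass


# Hypothetical test vectors: one with spread-out gradient, one with a single jump.
oscillating = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
single_jump = [0.0, 0.0, 0.0, 5.0, 5.0, 5.0]
```

A vector whose gradient mass is spread over many entries passes the test for small $K$, whereas a vector whose gradient is itself sparse fails it. The theorem requires that no nonzero vector of the second kind lies in the null space of $A$.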

2.2 Recovery Thresholds via Escape through the Mesh Theorem 

In this subsection, we prove that a measurement matrix $A$ whose elements are i.i.d. Gaussian random variables satisfies the null space condition in Theorem 2.1 with high probability, as long as

$$M \geq C (NK)^{1/2} \ln N,$$

where $C > 0$ is a constant. This shows that, for any $M$ growing proportionally with $N$, the signal gradient sparsity $K$ that TV minimization is guaranteed to recover also grows proportionally with $N$, up to a logarithmic factor. Our proof builds on the following "Escape through the Mesh" theorem.

Theorem 2.2 (Escape through the mesh [16]). Let $S$ be a subset of the unit Euclidean sphere $\mathbb{S}^{N-1}$ in $\mathbb{R}^N$. Let $Y$ be a random $(N-M)$-dimensional subspace of $\mathbb{R}^N$, distributed uniformly in the Grassmannian with respect to the Haar measure. Define the Gaussian width of the set $S$ as $w(S) = \mathbb{E}\left(\sup_{w \in S} h^T w\right)$, where $h$ is a random column vector in $\mathbb{R}^N$ with i.i.d. $\mathcal{N}(0, 1)$ Gaussian elements. Assume that $w(S) < \sqrt{M} - \frac{1}{2\sqrt{M}}$. Then

$$P(Y \cap S = \emptyset) > 1 - 3.5\, e^{-\frac{\left(\sqrt{M} - \frac{1}{2\sqrt{M}} - w(S)\right)^2}{18}}.$$

If the elements of the measurement matrix $A$ are i.i.d. Gaussian random variables, then the null space of $A$ is a random $(N-M)$-dimensional subspace distributed uniformly in the Grassmannian with respect to the Haar measure (see [25]). To prove that the null space condition in Theorem 2.1 holds with high probability, we show that the Gaussian width $w(S)$ is of the order of $\sqrt{M}$ for the set

$$S = \{x : \|x\|_2 = 1, \ \text{and} \ \exists \mathcal{K} \subseteq \{1, 2, \ldots, N-1\} \ \text{s.t.} \ |\mathcal{K}| \leq K, \ \|(Dx)_{\mathcal{K}}\|_1 \geq \|(Dx)_{\mathcal{K}^c}\|_1\}.$$

For any $x \in S$ and a set $\mathcal{K}$ that satisfy $\|(Dx)_{\mathcal{K}}\|_1 \geq \|(Dx)_{\mathcal{K}^c}\|_1$, we have that

$$\|(Dx)_{\mathcal{K}^c}\|_1 \leq \|(Dx)_{\mathcal{K}}\|_1 \leq \sqrt{K}\|(Dx)_{\mathcal{K}}\|_2 \leq \sqrt{K}\|Dx\|_2 \leq 2\sqrt{K}\|x\|_2 = 2\sqrt{K}.$$

This implies $\|Dx\|_1 \leq 4\sqrt{K}$ and therefore

$$S \subset \tilde{S} := \{x : \|x\|_2 \leq 1, \ \|Dx\|_1 \leq 4\sqrt{K}\}.$$

In the following, we estimate the Gaussian width of $\tilde{S}$. We only consider the case $N = 2^L$; the proof for the other cases is essentially the same and does not change the order of the Gaussian width. For any $x \in \tilde{S}$, we decompose $x$ according to the Haar wavelet transform as

$$x = z^{(1)} + \cdots + z^{(L)} + y^{(L)}, \tag{4}$$
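where

$$z^{(\ell)} = \bar{z}^{(\ell)} \otimes [\underbrace{1 \ \cdots \ 1}_{2^{\ell-1}} \ \underbrace{-1 \ \cdots \ -1}_{2^{\ell-1}}], \quad \bar{z}^{(\ell)} = [\bar{z}^{(\ell)}_1 \ \bar{z}^{(\ell)}_2 \ \cdots \ \bar{z}^{(\ell)}_{N/2^\ell}],$$

and

$$y^{(L)} = \bar{y}^{(L)} \otimes [1 \ 1 \ \cdots \ 1], \quad \bar{y}^{(L)} = [\bar{y}^{(L)}_1].$$

Here $\otimes$ is the Kronecker product, i.e., $a \otimes b := [a_1 b \ a_2 b \ \cdots \ a_n b]$. The decomposition (4) is done recursively as follows. We first decompose $x = y^{(1)} + z^{(1)}$, where

$$y^{(1)} = \bar{y}^{(1)} \otimes [1 \ 1], \quad \bar{y}^{(1)} = [\bar{y}^{(1)}_1 \ \bar{y}^{(1)}_2 \ \cdots \ \bar{y}^{(1)}_{N/2}], \quad \bar{y}^{(1)}_i = \frac{x_{2i-1} + x_{2i}}{2},$$

and

$$z^{(1)} = \bar{z}^{(1)} \otimes [1 \ -1], \quad \bar{z}^{(1)} = [\bar{z}^{(1)}_1 \ \bar{z}^{(1)}_2 \ \cdots \ \bar{z}^{(1)}_{N/2}], \quad \bar{z}^{(1)}_i = \frac{x_{2i-1} - x_{2i}}{2}.$$

Then, we further decompose

$$y^{(1)} = y^{(2)} + z^{(2)},$$

where

$$y^{(2)} = \bar{y}^{(2)} \otimes [1 \ 1 \ 1 \ 1], \quad \bar{y}^{(2)} = [\bar{y}^{(2)}_1 \ \bar{y}^{(2)}_2 \ \cdots \ \bar{y}^{(2)}_{N/4}], \quad \bar{y}^{(2)}_i = \frac{\bar{y}^{(1)}_{2i-1} + \bar{y}^{(1)}_{2i}}{2},$$

and

$$z^{(2)} = \bar{z}^{(2)} \otimes [1 \ 1 \ -1 \ -1], \quad \bar{z}^{(2)} = [\bar{z}^{(2)}_1 \ \bar{z}^{(2)}_2 \ \cdots \ \bar{z}^{(2)}_{N/4}], \quad \bar{z}^{(2)}_i = \frac{\bar{y}^{(1)}_{2i-1} - \bar{y}^{(1)}_{2i}}{2}.$$

Generally, at level $\ell$, we have that

$$y^{(\ell)} = \bar{y}^{(\ell)} \otimes [\underbrace{1 \ \cdots \ 1}_{2^{\ell}}], \quad \bar{y}^{(\ell)} = [\bar{y}^{(\ell)}_1 \ \bar{y}^{(\ell)}_2 \ \cdots \ \bar{y}^{(\ell)}_{N/2^\ell}],$$

and we decompose it as $y^{(\ell)} = y^{(\ell+1)} + z^{(\ell+1)}$, where

$$y^{(\ell+1)} = \bar{y}^{(\ell+1)} \otimes [\underbrace{1 \ \cdots \ 1}_{2^{\ell+1}}], \quad \bar{y}^{(\ell+1)} = [\bar{y}^{(\ell+1)}_1 \ \cdots \ \bar{y}^{(\ell+1)}_{N/2^{\ell+1}}], \quad \bar{y}^{(\ell+1)}_i = \frac{\bar{y}^{(\ell)}_{2i-1} + \bar{y}^{(\ell)}_{2i}}{2},$$

and

$$z^{(\ell+1)} = \bar{z}^{(\ell+1)} \otimes [\underbrace{1 \ \cdots \ 1}_{2^{\ell}} \ \underbrace{-1 \ \cdots \ -1}_{2^{\ell}}], \quad \bar{z}^{(\ell+1)} = [\bar{z}^{(\ell+1)}_1 \ \cdots \ \bar{z}^{(\ell+1)}_{N/2^{\ell+1}}], \quad \bar{z}^{(\ell+1)}_i = \frac{\bar{y}^{(\ell)}_{2i-1} - \bar{y}^{(\ell)}_{2i}}{2}.$$
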

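The decomposition (4) has the following properties.

• Obviously, the components in the decomposition (4) are orthogonal to each other. Consequently, with $\bar{z}^{(\ell)}$ and $\bar{y}^{(L)}$ denoting the coefficient vectors of $z^{(\ell)}$ and $y^{(L)}$ in (4),

$$\|x\|_2^2 = \|z^{(1)}\|_2^2 + \|z^{(2)}\|_2^2 + \cdots + \|z^{(L)}\|_2^2 + \|y^{(L)}\|_2^2 = \sum_{\ell=1}^{L} 2^\ell \|\bar{z}^{(\ell)}\|_2^2 + 2^L \|\bar{y}^{(L)}\|_2^2.$$

Since $x \in \tilde{S}$ implies $\|x\|_2 \leq 1$, we have

$$\sum_{\ell=1}^{L} 2^\ell \|\bar{z}^{(\ell)}\|_2^2 + 2^L \|\bar{y}^{(L)}\|_2^2 \leq 1. \tag{5}$$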
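• It can be shown that, with the convention $y^{(0)} = \bar{y}^{(0)} = x$,

$$\|Dy^{(\ell)}\|_1 \leq \|Dy^{(\ell-1)}\|_1 \leq \cdots \leq \|Dx\|_1 \leq 4\sqrt{K},$$

and, therefore,

$$\|\bar{z}^{(\ell)}\|_1 \leq \|Dy^{(\ell-1)}\|_1 / 2 \leq 2\sqrt{K}. \tag{6}$$

Indeed, let $u$ satisfy $\|Dy^{(\ell)}\|_1 = \langle u, Dy^{(\ell)} \rangle$ and $\|u\|_\infty \leq 1$. We then have

$$\begin{aligned}
\|Dy^{(\ell)}\|_1 &= \sum_{i=1}^{N/2^\ell - 1} u_{i2^\ell} \left( \bar{y}^{(\ell)}_{i+1} - \bar{y}^{(\ell)}_{i} \right) \\
&= \sum_{i=1}^{N/2^\ell - 1} u_{i2^\ell} \left( \frac{\bar{y}^{(\ell-1)}_{2i+2} + \bar{y}^{(\ell-1)}_{2i+1}}{2} - \frac{\bar{y}^{(\ell-1)}_{2i} + \bar{y}^{(\ell-1)}_{2i-1}}{2} \right) \\
&= \sum_{i=1}^{N/2^\ell - 1} u_{i2^\ell} \left( \frac{\bar{y}^{(\ell-1)}_{2i+2} - \bar{y}^{(\ell-1)}_{2i+1}}{2} + \left( \bar{y}^{(\ell-1)}_{2i+1} - \bar{y}^{(\ell-1)}_{2i} \right) + \frac{\bar{y}^{(\ell-1)}_{2i} - \bar{y}^{(\ell-1)}_{2i-1}}{2} \right) \\
&= \frac{u_{2^\ell}}{2} \left( \bar{y}^{(\ell-1)}_{2} - \bar{y}^{(\ell-1)}_{1} \right) + \frac{u_{N-2^\ell}}{2} \left( \bar{y}^{(\ell-1)}_{N/2^{\ell-1}} - \bar{y}^{(\ell-1)}_{N/2^{\ell-1}-1} \right) \\
&\quad + \sum_{i=1}^{N/2^\ell - 1} u_{i2^\ell} \left( \bar{y}^{(\ell-1)}_{2i+1} - \bar{y}^{(\ell-1)}_{2i} \right) + \sum_{i=2}^{N/2^\ell - 1} \frac{u_{i2^\ell} + u_{(i-1)2^\ell}}{2} \left( \bar{y}^{(\ell-1)}_{2i} - \bar{y}^{(\ell-1)}_{2i-1} \right) \\
&= \langle \tilde{u}, Dy^{(\ell-1)} \rangle,
\end{aligned}$$

where $\tilde{u} \in \mathbb{R}^{N-1}$ is zero except at the indices $j 2^{\ell-1}$, $j = 1, \ldots, N/2^{\ell-1} - 1$, where

$$\tilde{u}_{2i \cdot 2^{\ell-1}} = u_{i2^\ell}, \qquad \tilde{u}_{(2i-1) 2^{\ell-1}} = \frac{u_{i2^\ell} + u_{(i-1)2^\ell}}{2},$$

with the convention $u_0 = u_N = 0$. Since

$$\left| \frac{u_{i2^\ell} + u_{(i-1)2^\ell}}{2} \right| \leq \frac{|u_{i2^\ell}| + |u_{(i-1)2^\ell}|}{2},$$

we have $\|\tilde{u}\|_\infty \leq \|u\|_\infty \leq 1$. This leads to

$$\|Dy^{(\ell)}\|_1 = \langle u, Dy^{(\ell)} \rangle = \langle \tilde{u}, Dy^{(\ell-1)} \rangle \leq \|Dy^{(\ell-1)}\|_1.$$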
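The two properties above can be sanity-checked numerically on a small example. The following Python sketch, written under the assumption that the level-$\ell$ coefficients are the pairwise averages and half-differences of the previous level, rebuilds $x$ from its Haar components, evaluates the energy identity behind (5), and records the total variation of each averaged level. The function names and the example vector are illustrative assumptions.

```python
def haar_levels(x):
    """Return (z_bars, y_bars) for a vector x whose length is a power of two.

    y_bars[0] is x itself; y_bars[l] holds the pairwise averages at level l
    and z_bars[l - 1] holds the pairwise half-differences at level l.
    """
    y_bars, z_bars = [list(x)], []
    while len(y_bars[-1]) > 1:
        prev = y_bars[-1]
        half = len(prev) // 2
        y_bars.append([(prev[2 * i] + prev[2 * i + 1]) / 2 for i in range(half)])
        z_bars.append([(prev[2 * i] - prev[2 * i + 1]) / 2 for i in range(half)])
    return z_bars, y_bars


def kron(coeffs, pattern):
    """Kronecker product of two row vectors: [c_1 * pattern, c_2 * pattern, ...]."""
    return [c * p for c in coeffs for p in pattern]


def tv(v):
    """Total variation ||Dv||_1 of a vector."""
    return sum(abs(v[i + 1] - v[i]) for i in range(len(v) - 1))


x = [3.0, 3.0, 3.0, 1.0, 1.0, 1.0, 1.0, 4.0]   # hypothetical example, N = 2^3
N, L = len(x), 3
z_bars, y_bars = haar_levels(x)

# Rebuild x = z^(1) + ... + z^(L) + y^(L).
reconstruction = kron(y_bars[L], [1.0] * N)
for level, z_bar in enumerate(z_bars, start=1):
    half = 2 ** (level - 1)
    z = kron(z_bar, [1.0] * half + [-1.0] * half)
    reconstruction = [r + t for r, t in zip(reconstruction, z)]

# Energy identity: ||x||_2^2 = sum_l 2^l ||z_bar^(l)||_2^2 + 2^L ||y_bar^(L)||_2^2.
energy = sum(2 ** level * sum(c * c for c in z_bar)
             for level, z_bar in enumerate(z_bars, start=1))
energy += 2 ** L * y_bars[L][0] ** 2

# The TV of the expanded y^(l) equals the TV of its coefficient vector y_bar^(l).
tv_per_level = [tv(y) for y in y_bars]
```

In this example the total variation decreases from level to level, and each $\|\bar{z}^{(\ell)}\|_1$ is at most half the total variation of the previous level, in line with the second property.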

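Now we are ready to estimate the Gaussian width of $\tilde{S}$. Let $g$ be a vector whose entries are i.i.d. Gaussian random variables with mean 0 and variance 1. Since (5) implies $\|\bar{z}^{(\ell)}\|_2 \leq \frac{1}{\sqrt{2^\ell}}$, we have, by the Cauchy-Schwarz inequality, $\|\bar{z}^{(\ell)}\|_1 \leq \sqrt{N/2^\ell}\, \|\bar{z}^{(\ell)}\|_2 \leq \frac{\sqrt{N}}{2^\ell}$. This together with (6) implies that

$$\|\bar{z}^{(\ell)}\|_1 \leq \min\left\{ \frac{\sqrt{N}}{2^\ell}, \ 2\sqrt{K} \right\}.$$

Then,

$$\langle z^{(\ell)}, g \rangle = \langle \bar{z}^{(\ell)}, g^{(\ell)} \rangle \leq \|\bar{z}^{(\ell)}\|_1 \|g^{(\ell)}\|_\infty.$$

Here

$$g^{(\ell)} = \left[ \sum_{i=1}^{2^{\ell-1}} (g_i - g_{i+2^{\ell-1}}) \quad \sum_{i=1}^{2^{\ell-1}} (g_{i+2^\ell} - g_{i+2^\ell+2^{\ell-1}}) \quad \cdots \quad \sum_{i=1}^{2^{\ell-1}} (g_{i+N-2^\ell} - g_{i+N-2^\ell+2^{\ell-1}}) \right] =: \left[ g^{(\ell)}_1 \ g^{(\ell)}_2 \ \cdots \ g^{(\ell)}_{N/2^\ell} \right].$$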
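In the following, we estimate $\mathbb{E}(\|g^{(\ell)}\|_\infty)$. Notice that the components of $g^{(\ell)}$ are i.i.d. random variables that follow $\mathcal{N}(0, 2^\ell)$. The following argument follows Lemma 4.4 of Rudelson and Vershynin's paper [22]. Let $p$ be a large enough number to be determined later. Then

$$\begin{aligned}
\mathbb{E}\left( \|g^{(\ell)}\|_\infty \right) &\leq \mathbb{E}\left( \sum_{i=1}^{N/2^\ell} |g^{(\ell)}_i|^p \right)^{1/p} \leq (N/2^\ell)^{1/p} \left( \mathbb{E}\left( |g^{(\ell)}_1|^p \right) \right)^{1/p} \\
&\leq \sqrt{2^\ell}\, (N/2^\ell)^{1/p} \left( 2^{p/2}\, \frac{\Gamma(p/2 + 1/2)}{\Gamma(1/2)} \right)^{1/p} \\
&\leq \sqrt{2^\ell}\, (N/2^\ell)^{1/p} \left( \frac{p+1}{e} \right)^{1/2}.
\end{aligned}$$

Choose $p = 2\ln(N/2^\ell)$, and we obtain

$$\mathbb{E}\left( \|g^{(\ell)}\|_\infty \right) \leq \sqrt{2^\ell}\, (p+1)^{1/2} = \sqrt{2^\ell} \sqrt{1 + 2\ln(N/2^\ell)} = \sqrt{2^\ell} \sqrt{2 \ln\left( e^{1/2} N/2^\ell \right)}.$$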
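The bound on the expected maximum can be compared with a quick Monte Carlo estimate. The sketch below is only a sanity check under assumed parameters (64 i.i.d. Gaussian entries with variance 4, 2000 trials, a fixed seed); the function names are hypothetical, and `n` plays the role of $N/2^\ell$ while `variance` plays the role of $2^\ell$.

```python
import math
import random


def expected_max_bound(n, variance):
    """Right-hand side sqrt(variance) * sqrt(2 ln(e^{1/2} n)) of the bound."""
    return math.sqrt(variance) * math.sqrt(2.0 * math.log(math.sqrt(math.e) * n))


def empirical_expected_max(n, variance, trials=2000, seed=0):
    """Monte Carlo estimate of E max_i |g_i| for n i.i.d. N(0, variance) entries."""
    rng = random.Random(seed)
    sigma = math.sqrt(variance)
    total = 0.0
    for _ in range(trials):
        total += max(abs(rng.gauss(0.0, sigma)) for _ in range(n))
    return total / trials


# Assumed illustration: n = N / 2^l = 64 entries, each with variance 2^l = 4.
bound = expected_max_bound(64, 4.0)
estimate = empirical_expected_max(64, 4.0)
```

For these parameters the empirical mean stays clearly below the analytic bound. The gap reflects the slack introduced by bounding the maximum through the $p$-th moment.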

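Therefore,

$$\mathbb{E} \sup_{x \in \tilde{S}} \langle z^{(\ell)}, g \rangle \leq \mathbb{E} \sup_{x \in \tilde{S}} \|\bar{z}^{(\ell)}\|_1 \|g^{(\ell)}\|_\infty \leq \min\left\{ \frac{\sqrt{N}}{2^\ell}, \ 2\sqrt{K} \right\} \cdot \sqrt{2^\ell} \sqrt{2\ln\left( e^{1/2} N/2^\ell \right)} = \min\left\{ \sqrt{\frac{N}{2^\ell}}, \ 2\sqrt{2^\ell K} \right\} \cdot \sqrt{2\ln\left( e^{1/2} N/2^\ell \right)}.$$

This together with (4) implies that

$$\mathbb{E}\left( \sup_{x \in \tilde{S}} \langle x, g \rangle \right) \leq \sqrt{\frac{2}{\pi}} + \sum_{\ell=1}^{L} \min\left\{ \sqrt{\frac{N}{2^\ell}}, \ 2\sqrt{2^\ell K} \right\} \cdot \sqrt{2\ln\left( e^{1/2} N/2^\ell \right)}. \tag{7}$$

Now we estimate the sum $\sum_{\ell=1}^{L} \min\left\{ \sqrt{N/2^\ell}, \ 2\sqrt{2^\ell K} \right\} \cdot \sqrt{2\ln\left( e^{1/2} N/2^\ell \right)}$. Let $L_0$ be the maximum integer that satisfies $\sqrt{N/2^{L_0}} \geq 2\sqrt{2^{L_0} K}$, which leads to $2^{L_0} \leq \frac{1}{2}\sqrt{N/K}$. Since $L_0$ is the maximum such integer, we have $\frac{1}{2}\sqrt{N/K} < 2^{L_0+1}$. It is obvious that $\min\left\{ \sqrt{N/2^\ell}, \ 2\sqrt{2^\ell K} \right\} = 2\sqrt{2^\ell K}$ if $\ell \leq L_0$ and $\min\left\{ \sqrt{N/2^\ell}, \ 2\sqrt{2^\ell K} \right\} = \sqrt{N/2^\ell}$ otherwise. Therefore, if $N \geq 1$ and $K \geq 1$, then

$$\sum_{\ell=1}^{L} \min\left\{ \sqrt{\frac{N}{2^\ell}}, \ 2\sqrt{2^\ell K} \right\} \cdot \sqrt{2\ln\left( e^{1/2} N/2^\ell \right)} \leq \left( \sum_{\ell \leq L_0} 2\sqrt{2^\ell K} + \sum_{\ell > L_0} \sqrt{\frac{N}{2^\ell}} \right) \sqrt{2\ln\left( e^{1/2} N \right)} \leq (4\sqrt{2} + 4)(NK)^{1/4} \sqrt{2\ln\left( e^{1/2} N \right)}. \tag{8}$$

Finally, we get

$$\mathbb{E}\left( \sup_{x \in \tilde{S}} \langle x, g \rangle \right) \leq \sqrt{\frac{2}{\pi}} + (4\sqrt{2} + 4)(NK)^{1/4} \sqrt{2\ln\left( e^{1/2} N \right)}.$$

Since the Gaussian width in Theorem 2.2 must be of the order $\sqrt{M}$, we have

$$M \sim (NK)^{1/2} \ln N.$$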
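The level-by-level sum and the closed-form estimate $(4\sqrt{2}+4)(NK)^{1/4}\sqrt{2\ln(e^{1/2}N)}$ can be compared directly. The following Python sketch evaluates both for a few assumed values of $N = 2^L$ and $K$; the function names are hypothetical, and the snippet only illustrates the arithmetic of the estimate.

```python
import math


def level_sum(N, K):
    """Sum over l = 1..L of min{sqrt(N/2^l), 2 sqrt(2^l K)} * sqrt(2 ln(e^{1/2} N / 2^l))."""
    L = N.bit_length() - 1          # N is assumed to be a power of two, N = 2^L
    total = 0.0
    for level in range(1, L + 1):
        scale = 2 ** level
        smaller = min(math.sqrt(N / scale), 2.0 * math.sqrt(scale * K))
        total += smaller * math.sqrt(2.0 * math.log(math.sqrt(math.e) * N / scale))
    return total


def closed_form_bound(N, K):
    """Closed-form estimate (4 sqrt(2) + 4) (N K)^{1/4} sqrt(2 ln(e^{1/2} N))."""
    return ((4.0 * math.sqrt(2.0) + 4.0) * (N * K) ** 0.25
            * math.sqrt(2.0 * math.log(math.sqrt(math.e) * N)))


# Assumed grid of problem sizes for illustration.
cases = [(2 ** L, K) for L in range(4, 17) for K in (1, 4, 16) if K < 2 ** L]
all_within_bound = all(level_sum(N, K) <= closed_form_bound(N, K) for N, K in cases)
```

The geometric growth of $2\sqrt{2^\ell K}$ below the crossover level and the geometric decay of $\sqrt{N/2^\ell}$ above it are what keep the sum at the order $(NK)^{1/4}$ rather than $L (NK)^{1/4}$.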

2.3 Recovery Thresholds via the Grassmann Angle Framework 

In the previous subsection, we used the "Escape through the Mesh" theorem to establish performance guarantees of TV minimization for signal recovery. In this subsection, we explore the Grassmann angle framework [28] to characterize performance guarantees of TV minimization for 1-dimensional signal vectors. The upshot here is that the Grassmann angle framework gives explicitly computable thresholds on the recoverable sparsity level $K$ when the number of measurements grows proportionally with the signal dimension $N$.

Let us use $\mathcal{K}$ to denote the set of indices $i$ such that $|x_{i+1} - x_i|$ is one of the $K$ terms on the left side of the inequality (3) in Theorem 2.1. Let us denote the set of indices ($i$'s and $(i+1)$'s) involved in these $K$ terms as $\mathcal{DK}$. We note that the cardinality $|\mathcal{DK}|$ of $\mathcal{DK}$ is at most $2K$.

Then there exist at least $(N - 1 - 3K)$ terms of the form $|x_{i+1} - x_i|$ that do not involve any index in $\mathcal{DK}$. Among these $(N - 1 - 3K)$ terms, we can choose at least $\frac{N-1-3K}{2}$ terms such that each of them involves indices different from those in $\mathcal{DK}$ and from each other. Let us use $\mathcal{KB}$ to denote the set of indices $i$ such that $|x_{i+1} - x_i|$ is one of these $\frac{N-1-3K}{2}$ terms. By the triangle inequality,

$$\sum_{i \in \mathcal{K}} |x_{i+1} - x_i| \leq 2 \sum_{i \in \mathcal{DK}} |x_i|.$$

Then one sufficient condition for TV minimization to work is that

$$2 \sum_{i \in \mathcal{DK}} |x_i| < \sum_{i \in \mathcal{KB}} |x_{i+1} - x_i| \tag{9}$$

holds for every vector $x$ in the null space of the projection $A$. We call this condition the RelaxedNULL condition.

Figure 1: Recoverable thresholds on sparsity of gradient support for TV minimization from the Grassmann angle framework [28]

Since we are taking the projection $A$ uniformly over all the $M$-dimensional subspaces in $\mathbb{R}^N$, the probability that the RelaxedNULL condition holds is equal to the probability that

$$2 \sum_{i \in \{1, 2, \ldots, |\mathcal{DK}|\}} |x_i| < \sqrt{2} \sum_{i \in \{|\mathcal{DK}|+1, \ldots, |\mathcal{DK}| + \frac{N-1-3K}{2}\}} |x_i| \tag{10}$$

holds for every vector $x$ in the null space of a uniformly distributed $M'$-dimensional projection $A'$ in $\mathbb{R}^{|\mathcal{DK}| + \frac{N-1-3K}{2}}$, where $M' = |\mathcal{DK}| + \frac{N-1-3K}{2} - (N - M)$. This is because the null space of a uniform $M$-dimensional projection in $\mathbb{R}^N$ can be represented as $\{x : x = Hz, \ z \in \mathbb{R}^{N-M}\}$, where $H$ is an $N \times (N - M)$ matrix whose elements are i.i.d. Gaussian random variables $\mathcal{N}(0, 1)$. With $H_l$ denoting the $l$-th row of $H$, $H_{l+1} - H_l$ is just a row vector whose elements are i.i.d. Gaussian random variables $\mathcal{N}(0, 2)$. Noting that $x_{i+1} - x_i = \langle H_{i+1} - H_i, z \rangle$, we can think of $x_{i+1} - x_i$ as $\sqrt{2}$ times an element of a vector in a uniform $M'$-dimensional subspace.

Now our problem reduces to determining for what values of $K$ the RelaxedNULL condition (9) holds, with high probability, simultaneously for every gradient support set $\mathcal{K}$ (which determines $\mathcal{DK}$ and $\mathcal{KB}$). This falls exactly into the Grassmann angle framework [11, 13, 28], which can compute such $K$ using the Grassmann angle tools from high-dimensional convex polytope theory. For details, the reader can refer to [28], with the corresponding parameter $C$ in [28] set as $\sqrt{2}$. Figure 1 plots the recoverable threshold $\frac{K}{N}$ as a function of the compression ratio $\frac{M}{N}$ as $N \to \infty$.

2.4 Gradient Sparsity K Growing Linearly with Signal Dimension N 

In this part, we consider the regime of interest where the sparsity of the signal gradient grows linearly with the problem dimension. The main result is summarized in the following theorem, showing that TV minimization allows the gradient sparsity $K$ to grow proportionally with the signal dimension $N$.

Theorem 2.3. Suppose that the measurement matrix $A$ is an $M \times N$ matrix having i.i.d. standard zero-mean Gaussian elements. For any constant $0 < \alpha < 1$, there exists a constant $\delta > 0$ such that the following statement holds true with overwhelming probability as $M \to \infty$, $N \to \infty$, and $\frac{M}{N} \to \alpha$.

For all subsets $\mathcal{K} \subseteq \{1, 2, \ldots, N-1\}$ with cardinality $|\mathcal{K}| \leq \delta N$, and for every nonzero vector $x$ in the null space of $A$ (namely $Ax = 0$, $x \neq 0$),

$$\|(Dx)_{\mathcal{K}}\|_1 < \|(Dx)_{\mathcal{K}^c}\|_1, \tag{11}$$

where $\mathcal{K}^c = \{1, 2, \ldots, N-1\} \setminus \mathcal{K}$.

To prove Theorem 2.3, we first prove a uniform lower bound for the TV norm in Subsection 2.4.1 and then utilize the lower bound to arrive at the conclusion in Subsection 2.4.2.

2.4.1 Uniform Lower Bound for Total Variation Norm

We consider the $(N - M)$-dimensional null space of the measurement matrix $A$. Recall that $A$ has i.i.d. standard zero-mean Gaussian elements. Equivalently, a basis for the null space of $A$ can be represented by an $N \times (N - M)$ matrix $H$ with i.i.d. standard zero-mean Gaussian elements. To prove the null space property for successful signal recovery using TV minimization, we only need to prove that the null space property holds for those vectors $Hz$, where $z \in \mathbb{R}^{N-M}$ with $\|z\|_2 = 1$.
To this end, we first establish the following claim.

Theorem 2.4. With high probability as $N \to \infty$, uniformly for every $x = Hz$ with $z \in \mathbb{R}^{N-M}$ and $\|z\|_2 = 1$,

$$\|Dx\|_1 \geq \gamma N,$$

where $\gamma > 0$ is a sufficiently small positive constant.

We divide the proof into three parts. In the first part (Subsection 2.4.1.1), we establish an upper bound that holds uniformly for every $x = Hz$ with $z \in \mathbb{R}^{N-M}$ and $\|z\|_2 = 1$. In the second part (Subsection 2.4.1.2), assuming that a certain deviation bound holds true for the total variation norm (to be proven in Subsection 2.4.1.3), we establish Theorem 2.4 using the technique of $\epsilon$-nets. In the third part (Subsection 2.4.1.3), we prove the needed deviation bound for the total variation norm.



2.4.1.1 Upper Bound for the Total Variation First of all, with high probability, as $N \to \infty$, for every $x = Hz$ with $z \in \mathbb{R}^{N-M}$ and $\|z\|_2 = 1$,

$$\|x\|_2 \leq C_1 \sqrt{N},$$

where $C_1$ is a constant independent of $N$. Here we have used the deviation bound for the largest singular value of matrices with i.i.d. Gaussian elements [15].
It follows that

$$\|Dx\|_1 \leq 2\|x\|_1 \leq 2\sqrt{N}\|x\|_2 \leq 2 C_1 N.$$
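The first two inequalities in this chain are deterministic and hold for every vector: the triangle inequality gives $\|Dx\|_1 \leq 2\|x\|_1$, and the Cauchy-Schwarz inequality gives $\|x\|_1 \leq \sqrt{N}\|x\|_2$. The short Python sketch below checks them on a random Gaussian vector; the seed, the dimension, and the helper names are assumptions made only for illustration.

```python
import math
import random


def tv_norm(x):
    """||Dx||_1 with [Dx]_i = x_{i+1} - x_i."""
    return sum(abs(x[i + 1] - x[i]) for i in range(len(x) - 1))


def l1_norm(x):
    return sum(abs(v) for v in x)


def l2_norm(x):
    return math.sqrt(sum(v * v for v in x))


rng = random.Random(0)                  # fixed seed, illustrative only
N = 256
x = [rng.gauss(0.0, 1.0) for _ in range(N)]

tv_value = tv_norm(x)                   # ||Dx||_1
l1_bound = 2.0 * l1_norm(x)             # 2 ||x||_1
l2_bound = 2.0 * math.sqrt(N) * l2_norm(x)   # 2 sqrt(N) ||x||_2
```

Only the last step, $\|x\|_2 \leq C_1\sqrt{N}$, is probabilistic; it relies on the bound for the largest singular value of $H$.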



2.4.1.2 Uniform Lower Bound on Total Variation through the $\epsilon$-Net We cover the sphere $\{z : \|z\|_2 = 1\}$ with an $\epsilon$-net, where $\epsilon = C\gamma$, and $C > 0$ and $\gamma > 0$ are constants we will choose later. An $\epsilon$-net is a finite set $V = \{v_1, \ldots, v_L\}$ on $\{z : \|z\|_2 = 1\}$ such that for every point $z$ from $\{z : \|z\|_2 = 1\}$, there is a $v_l \in V$ such that $\|z - v_l\|_2 \leq \epsilon$. The size of the $\epsilon$-net can be taken no bigger than $\left(1 + \frac{2}{\epsilon}\right)^{N-M}$.
From Subsection 2.4.1.3, we know that for every $x = Hz$ generated by points $z$ from the $\epsilon$-net,

$$\|Dx\|_1 \geq \gamma N,$$

where $\gamma > 0$ is a sufficiently small constant.

For any $z$ such that $\|z\|_2 = 1$, there exists a point $v_0$ (we change the subscript numbering for $V$ to index the order) in $V$ such that $\|z - v_0\|_2 =: \epsilon_1 \leq \epsilon$. Let $z_1$ denote $z - v_0$; then $\|z_1 - \epsilon_1 v_1\|_2 =: \epsilon_2 \leq \epsilon_1 \epsilon \leq \epsilon^2$ for some $v_1$ in $V$. Repeating this process, we have $z = \sum_{j \geq 0} \epsilon_j v_j$, where $\epsilon_0 = 1$, $\epsilon_j \leq \epsilon^j$, and $v_j \in V$.

Then

$$\begin{aligned}
\sum_i |(D(Hz))_i| &= \sum_i \left| \sum_{j \geq 0} \epsilon_j (D(Hv_j))_i \right| \\
&\geq \sum_i |(D(Hv_0))_i| - \sum_{j \geq 1} \epsilon_j \sum_i |(D(Hv_j))_i| \\
&\geq \gamma N - \frac{\epsilon}{1 - \epsilon} \times 2 C_1 N,
\end{aligned}$$

where the first inequality follows from the triangle inequality, and the last inequality follows from the upper bound on the TV norm in Subsection 2.4.1.1.

We have just shown that, for every $x = Hz$ with $z \in \mathbb{R}^{N-M}$ and $\|z\|_2 = 1$,

$$\|Dx\|_1 \geq \gamma N - \frac{\epsilon}{1 - \epsilon} \times 2 C_1 N.$$

For any positive constant $\beta > 0$, we can always take $C > 0$ to be a sufficiently small constant (this does not affect the proof and conclusion in Subsection 2.4.1.3), such that

$$\|Dx\|_1 \geq (1 - \beta)\gamma N$$

holds true for every $x = Hz$ with $z \in \mathbb{R}^{N-M}$ and $\|z\|_2 = 1$.
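The choice of the constant $C$ can be made explicit. Requiring $\gamma - \frac{\epsilon}{1-\epsilon} \cdot 2C_1 \geq (1-\beta)\gamma$ with $\epsilon = C\gamma$ is equivalent to $C \leq \frac{\beta}{2C_1 + \beta\gamma}$. The Python sketch below evaluates this threshold for assumed values of $\beta$, $\gamma$, and $C_1$ (all chosen here purely for illustration) and confirms that the resulting per-$N$ lower bound equals $(1-\beta)\gamma$.

```python
def largest_admissible_C(beta, gamma, C1):
    """Largest C with gamma - eps/(1-eps) * 2*C1 >= (1-beta)*gamma, where eps = C*gamma."""
    return beta / (2.0 * C1 + beta * gamma)


def lower_bound_per_N(gamma, C, C1):
    """The quantity gamma - eps/(1-eps) * 2*C1 that multiplies N in the lower bound."""
    eps = C * gamma
    return gamma - eps / (1.0 - eps) * 2.0 * C1


# Assumed illustrative constants (not taken from the paper).
beta, gamma, C1 = 0.1, 0.01, 3.0
C = largest_admissible_C(beta, gamma, C1)
bound = lower_bound_per_N(gamma, C, C1)       # equals (1 - beta) * gamma
smaller_C_bound = lower_bound_per_N(gamma, C / 2.0, C1)
```

Any smaller $C$ only improves the lower bound, at the price of a finer net and hence a larger union bound in the next subsection.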

2.4.1.3 Proving the deviation bound In this subsection, we prove that, for a constant $C > 0$, a sufficiently small constant $\gamma > 0$, and $\epsilon = C\gamma$, for every $x = Hz$ generated by points $z$ from the $\epsilon$-net,

$$\|Dx\|_1 \geq \gamma N,$$

with overwhelming probability as $N \to \infty$.

We claim that it is sufficient to prove that, for a vector $x$ of i.i.d. zero-mean standard Gaussian random variables, as $N \to \infty$, with probability at most $e^{-N(\log(1/\gamma) + C_2)}$, where $C_2 > 0$ is a constant,

$$\|Dx\|_1 \leq \gamma N.$$

In fact, we recall that the size of the $\epsilon$-net is at most $\left(1 + \frac{2}{\epsilon}\right)^{N-M}$, and notice that, for any point $z \in \mathbb{R}^{N-M}$ with $\|z\|_2 = 1$, the elements of $Hz$ are i.i.d. standard zero-mean Gaussian random variables. By a simple union bound, with probability at most

$$P = \left(1 + \frac{2}{\epsilon}\right)^{N-M} e^{-N(\log(1/\gamma) + C_2)},$$

there exists some point $x = Hz$ with $z$ from the $\epsilon$-net such that

$$\|Dx\|_1 \leq \gamma N.$$

No matter what $C$ we are looking at, if we take $\gamma > 0$ sufficiently small, this probability $P$ converges to 0 as $N \to \infty$, $M \to \infty$, and $\frac{M}{N} \to \alpha$, where $\alpha$ is a constant. This means that, with overwhelming probability, for all points from the $\epsilon$-net,

$$\|Dx\|_1 \geq \gamma N.$$

This leads to the conclusion in Subsection 2.4.1.2 that, with overwhelming probability, for every $x = Hz$ with $z \in \mathbb{R}^{N-M}$ and $\|z\|_2 = 1$,

$$\|Dx\|_1 \geq (1 - \beta)\gamma N.$$

Now we focus on proving the following theorem about a sequence of i.i.d. zero-mean Gaussian random variables of unit variance.

Theorem 2.5. Suppose $x_1, x_2, \ldots, x_N$ are $N$ independent random variables following the standard Gaussian distribution $\mathcal{N}(0, 1)$. Then for all sufficiently small $\gamma > 0$, the probability

$$P\left( \sum_{i=1}^{N-1} |x_{i+1} - x_i| \leq \gamma N \right) \leq e^{-(1-\mu) N \left( \log(1/\gamma) + C_2 + o(1) \right)},$$

where both $\mu > 0$ and $C_2$ are constants independent of $\gamma$ and $N$; $o(1)$ goes to zero as $N \to \infty$; moreover, $\mu > 0$ can be made arbitrarily small.

Proof. Let us define $t_i = |x_{i+1} - x_i|$, $1 \leq i \leq N-1$. Suppose that

$$\|Dx\|_1 \leq \gamma N;$$

then among the terms in $\|Dx\|_1$, there must be at most $\frac{N}{T}$ terms which are larger than $T\gamma$ in magnitude, where $T$ is an arbitrary positive number (say, $T = 1000$); namely, there must be at least $\left(1 - \frac{1}{T}\right)N$ terms that are no bigger than $T\gamma$.

Let $\mathcal{M}_i$ denote the event that $t_i > T\gamma$, and let $\mathcal{M}_i^c$ denote the complementary event that $t_i \leq T\gamma$. We further define the indicator function $I_i$, $1 \leq i \leq N-1$, as

$$I_i = \begin{cases} 0 & \text{if } \mathcal{M}_i \text{ happens}, \\ 1 & \text{if } \mathcal{M}_i^c \text{ happens}. \end{cases}$$

Let $S$ be the set of $i$'s such that $t_i \leq T\gamma$, and let $S^c = \{1, 2, \ldots, N-1\} \setminus S$ be the complement of $S$. Then the probability that $\mathcal{M}_i^c$ happens for and only for $i \in S$ is

$$P(I_1, I_2, \ldots, I_{N-1}) = \prod_{i=1}^{N-1} P(I_i \mid I_1, I_2, \ldots, I_{i-1}).$$

When $I_i = 0$, we simply upper bound $P(I_i \mid I_1, I_2, \ldots, I_{i-1})$ by 1; when $I_i = 1$, we claim that $P(I_i \mid I_1, I_2, \ldots, I_{i-1})$ is upper bounded by $\frac{2}{\sqrt{2\pi}} T\gamma$. In fact, in the Gaussian Markov chain $x_{i+1} - x_i$, no matter what values $x_1, x_2, \ldots, x_{i-1}$ take, the probability of having a magnitude $|x_i - x_{i-1}|$ no larger than $T\gamma$ is maximized when $x_{i-1}$ is equal to 0. This leads to

$$P(I_1, I_2, \ldots, I_{N-1}) \leq \left( \frac{2}{\sqrt{2\pi}} T\gamma \right)^{|S|}.$$

Since $|S| \geq \left(1 - \frac{1}{T}\right)N$, we have

$$P(I_1, I_2, \ldots, I_{N-1}) \leq \left( \frac{2}{\sqrt{2\pi}} T\gamma \right)^{\left(1 - \frac{1}{T}\right)N}.$$

There are at most $\binom{N-1}{|S|}$ possibilities for the set $S$ with cardinality $|S|$. So the probability $P_1$ that $\|Dx\|_1 \leq \gamma N$ satisfies

$$P_1 \leq \sum_{j = \left(1 - \frac{1}{T}\right)N}^{N-1} \binom{N-1}{j} \left( \frac{2}{\sqrt{2\pi}} T\gamma \right)^{j}.$$

From Stirling's formula, and noticing that $j = \left(1 - \frac{1}{T}\right)N$ gives the biggest term in the upper bound of $P_1$ when $\gamma$ is sufficiently small such that $\frac{2}{\sqrt{2\pi}} T\gamma$ is smaller than $1 - \frac{1}{T}$, we have

$$\log(P_1)/N \leq H\left(\frac{1}{T}\right) + \left(1 - \frac{1}{T}\right) \log\left( \frac{2}{\sqrt{2\pi}} T\gamma \right) + o(1),$$

where $H(p) = p\log\left(\frac{1}{p}\right) + (1 - p)\log\left(\frac{1}{1-p}\right)$ is the entropy function, and $o(1)$ is a term that goes to 0 as $N \to \infty$.

Since we can pick an arbitrarily big constant $T$, we have the theorem statement by simply taking $\mu = \frac{1}{T}$. □

We remark that Theorem 2.5 eventually leads to the proof of Theorem 2.4.
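To see how the exponent in the proof behaves, the following Python sketch evaluates the per-$N$ rate $H(1/T) + (1 - 1/T)\log(c\,T\gamma)$ for a few values of $\gamma$. The constant $c = 2/\sqrt{2\pi}$ is an assumption of this sketch: it is the elementary bound on the probability that a unit-variance Gaussian falls in an interval of length $2T\gamma$, and the $o(1)$ term is ignored. The function names are hypothetical.

```python
import math


def entropy(p):
    """Entropy function H(p) = p log(1/p) + (1 - p) log(1/(1 - p)), natural log."""
    return p * math.log(1.0 / p) + (1.0 - p) * math.log(1.0 / (1.0 - p))


def exponent_rate(gamma, T):
    """Leading-order value of log(P_1) / N, ignoring the o(1) term.

    Assumes the per-term probability bound c * T * gamma with c = 2 / sqrt(2 pi).
    """
    mu = 1.0 / T
    c = 2.0 / math.sqrt(2.0 * math.pi)
    return entropy(mu) + (1.0 - mu) * math.log(c * T * gamma)


T = 1000.0                                   # the illustrative value used in the proof
rates = [exponent_rate(g, T) for g in (1e-4, 1e-5, 1e-6)]
```

Each factor of ten in $\gamma$ lowers the rate by roughly $(1 - \mu)\log 10$, which is the $(1-\mu)\log(1/\gamma)$ dependence stated in the theorem.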

2.4.2 Upper Bound on the Partial Total Variation Norm 

In this section, we prove the following theorem.

Theorem 2.6. Suppose a matrix $H$ is an $N \times (N - M)$ matrix having i.i.d. standard zero-mean Gaussian elements. For any constant $0 < \alpha < 1$ and any positive constant $\gamma > 0$, there exists a constant $\delta > 0$ such that the following statement holds true with overwhelming probability as $M \to \infty$, $N \to \infty$, and $\frac{M}{N} \to \alpha$.

For all subsets $\mathcal{K} \subseteq \{1, 2, \ldots, N-1\}$ with cardinality $|\mathcal{K}| \leq \delta N$, and for every $x = Hz$ with $z \in \mathbb{R}^{N-M}$ and $\|z\|_2 = 1$,

$$\|(Dx)_{\mathcal{K}}\|_1 < \frac{\gamma}{2} N. \tag{12}$$

We notice that $\|(Dx)_{\mathcal{K}}\|_1 \leq 2 \sum_{j \in \mathcal{M}} |x_j|$, where $\mathcal{M}$ is the set of indices $j$ such that $x_j$ is involved in the expression $\|(Dx)_{\mathcal{K}}\|_1$. Because of this, and the fact that the cardinality $|\mathcal{M}| \leq 2|\mathcal{K}|$, we can use the same methodology as in [27] to prove Theorem 2.6, based on the uniform lower bound result we have from Theorem 2.4.



3 Extension to Multidimensional signals 

In this section, we extend our results to $d$-dimensional ($d \geq 2$, for example $d = 2$ for images and $d = 3$ for videos) signal vectors. We get results that are comparable to those in [19, 20]. In particular, let $X \in \mathbb{R}^{N^d}$ be a multi-indexed vector that is from a $d$-dimensional signal. Let $A \in \mathbb{R}^{M \times N^d}$ be a measurement matrix whose elements are i.i.d. Gaussian random variables, and let $Y = AX$ be the corresponding measurements of $X$. Let $DX$ be the discrete gradient of $X$. Assume that $DX$ contains at most $K$ nonzero entries. In order to recover $X$, similar to (2), we solve the following minimization

$$\min_X \|DX\|_1 \quad \text{subject to} \quad Y = AX. \tag{13}$$

In the remainder of this section, we prove that the unique solution of (13) is exactly the original $X$ with high probability, as long as

$$M \geq \begin{cases} C_1 K \log_2 N \ln N & \text{if } d = 2, \\ C_2 K \ln N & \text{if } d > 2, \end{cases}$$

where $C_1 > 0$ and $C_2 > 0$ are two constants depending on $d$. Note that $\|DX\|_1$ in (13) is the anisotropic TV. Our proof can be generalized to isotropic TV without too much difficulty.

Similar to Theorem 2.1, a sufficient condition for the original $X$ to be the unique solution of (13) is the null space condition (3). Different from the 1-dimensional case, this null space condition is only a sufficient condition for higher-dimensional signals. Then, using the escape through the mesh theorem, this null space condition holds true with high probability if the Gaussian width satisfies $w(S_d) < \sqrt{M} - \frac{1}{2\sqrt{M}}$, where

$$S_d = \{X \in \mathbb{R}^{N^d} : \|X\|_2 = 1, \ \text{and} \ \|(DX)_{\mathcal{K}}\|_1 \geq \|(DX)_{\mathcal{K}^c}\|_1 \ \exists \mathcal{K} \subseteq \{1, \ldots, N\}^d \times \{1, \ldots, d\} \ \text{s.t.} \ |\mathcal{K}| \leq K\}.$$

Given any vector $X \in S_d$, we have

$$\|DX\|_1 = \|(DX)_{\mathcal{K}^c}\|_1 + \|(DX)_{\mathcal{K}}\|_1 \leq 2\|(DX)_{\mathcal{K}}\|_1 \leq 2\sqrt{K}\|(DX)_{\mathcal{K}}\|_2 \leq 2\sqrt{K}\|DX\|_2 \leq 4\sqrt{d}\sqrt{K}\|X\|_2 \leq 4\sqrt{d}\sqrt{K}.$$

We have used the fact that $\|DX\|_2 \leq 2\sqrt{d}\|X\|_2$. Therefore,

$$S_d \subset \tilde{S}_d := \{X \in \mathbb{R}^{N^d} : \|X\|_2 \leq 1, \ \|DX\|_1 \leq 4\sqrt{d}\sqrt{K}\}.$$
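For $d = 2$, the anisotropic discrete gradient concatenates horizontal and vertical finite differences. The minimal Python sketch below computes it for a small assumed image and checks the inequality $\|DX\|_2 \leq 2\sqrt{d}\|X\|_2$ used above. The function name, the boundary handling (differences are taken only between existing neighbours), and the example image are assumptions of this sketch.

```python
import math


def anisotropic_gradient(X):
    """Concatenate horizontal and vertical finite differences of a 2-D array."""
    rows, cols = len(X), len(X[0])
    horizontal = [X[r][c + 1] - X[r][c] for r in range(rows) for c in range(cols - 1)]
    vertical = [X[r + 1][c] - X[r][c] for r in range(rows - 1) for c in range(cols)]
    return horizontal + vertical


# Hypothetical 4 x 4 image: a 2 x 2 block of ones on a zero background.
X = [[1.0, 1.0, 0.0, 0.0],
     [1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0]]

grad = anisotropic_gradient(X)
anisotropic_tv = sum(abs(g) for g in grad)              # ||DX||_1
gradient_l2 = math.sqrt(sum(g * g for g in grad))       # ||DX||_2
signal_l2 = math.sqrt(sum(v * v for row in X for v in row))
d = 2
```

The image has four nonzero gradient entries, all located on the boundary of the block, so its gradient sparsity $K$ is much smaller than the number of pixels.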

In the following, we estimate the Gaussian width of $\tilde{S}_d$. Similar to the 1-dimensional case, we consider only the case where $N = 2^L$. For any $X \in \tilde{S}_d$, we decompose $X$ according to the Haar wavelet transform for $d$-dimensional vectors as

$$X = \sum_{\ell=1}^{L} \sum_{i \in \{0,1\}^d \setminus 0} Z^{(\ell, i)} + Y^{(L)}, \tag{14}$$
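where

$$Z^{(\ell, i)} = \bar{Z}^{(\ell, i)} \otimes H^{(i)} \otimes 1_{2^{\ell-1}}, \quad \bar{Z}^{(\ell, i)} \in \mathbb{R}^{(N/2^\ell)^d},$$

and

$$Y^{(L)} = \bar{Y}^{(L)} \otimes 1_{2^L}, \quad \bar{Y}^{(L)} \in \mathbb{R}.$$

Here $1_n \in \mathbb{R}^{n^d}$ is the $d$-dimensional array whose entries are all 1, and $\otimes$ is the Kronecker product, i.e., $A \otimes B$ is the block $d$-dimensional array whose $(j_1, j_2, \ldots, j_d)$ block is $A_{j_1 j_2 \ldots j_d} B$. Moreover, $H^{(i)} \in \mathbb{R}^{2^d}$ with $i = (i_1, i_2, \ldots, i_d)$ is the (scaled) Haar filter defined by

$$H^{(i)} = \bigotimes_{k=1}^{d} h^{(i_k)}, \quad h^{(0)} = [1 \ 1], \quad h^{(1)} = [1 \ -1].$$

In particular, we have $H^{(0)} = 1_2$.

The decomposition (14) is done recursively as follows. We first decompose $X =: Y^{(0)} = Y^{(1)} + \sum_{i \in \{0,1\}^d \setminus 0} Z^{(1, i)}$, where

$$Y^{(1)} = \bar{Y}^{(1)} \otimes 1_2, \quad \bar{Y}^{(1)}_k = \frac{\sum_{j \in \{0,1\}^d} H^{(0)}_j X_{2k-j}}{2^d} = \frac{\sum_{j \in \{0,1\}^d} X_{2k-j}}{2^d},$$

and

$$Z^{(1, i)} = \bar{Z}^{(1, i)} \otimes H^{(i)}, \quad \bar{Z}^{(1, i)}_k = \frac{\sum_{j \in \{0,1\}^d} H^{(i)}_j X_{2k-j}}{2^d}.$$

One can check that $Y^{(0)} = Y^{(1)} + \sum_{i \in \{0,1\}^d \setminus 0} Z^{(1, i)}$. Furthermore, it can be easily shown that this decomposition is an orthogonal decomposition. Then, we further decompose

$$Y^{(1)} = Y^{(2)} + \sum_{i \in \{0,1\}^d \setminus 0} Z^{(2, i)}, \tag{15}$$

where

$$Y^{(2)} = \bar{Y}^{(2)} \otimes 1_4, \quad \bar{Y}^{(2)}_k = \frac{\sum_{j \in \{0,1\}^d} H^{(0)}_j \bar{Y}^{(1)}_{2k-j}}{2^d} = \frac{\sum_{j \in \{0,1\}^d} \bar{Y}^{(1)}_{2k-j}}{2^d},$$

and

$$Z^{(2, i)} = \bar{Z}^{(2, i)} \otimes H^{(i)} \otimes 1_2, \quad \bar{Z}^{(2, i)}_k = \frac{\sum_{j \in \{0,1\}^d} H^{(i)}_j \bar{Y}^{(1)}_{2k-j}}{2^d}.$$

Again, one can check that (15) holds true and is an orthogonal decomposition. Generally, at level $\ell$, we have that

$$Y^{(\ell)} = \bar{Y}^{(\ell)} \otimes 1_{2^\ell},$$

and we decompose it as

$$Y^{(\ell)} = Y^{(\ell+1)} + \sum_{i \in \{0,1\}^d \setminus 0} Z^{(\ell+1, i)}, \tag{16}$$

where

$$Y^{(\ell+1)} = \bar{Y}^{(\ell+1)} \otimes 1_{2^{\ell+1}}, \quad \bar{Y}^{(\ell+1)}_k = \frac{\sum_{j \in \{0,1\}^d} H^{(0)}_j \bar{Y}^{(\ell)}_{2k-j}}{2^d} = \frac{\sum_{j \in \{0,1\}^d} \bar{Y}^{(\ell)}_{2k-j}}{2^d},$$

and

$$Z^{(\ell+1, i)} = \bar{Z}^{(\ell+1, i)} \otimes H^{(i)} \otimes 1_{2^\ell}, \quad \bar{Z}^{(\ell+1, i)}_k = \frac{\sum_{j \in \{0,1\}^d} H^{(i)}_j \bar{Y}^{(\ell)}_{2k-j}}{2^d}.$$

The decomposition (14) has the following properties.
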
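• Obviously, the components in the decomposition (14) are orthogonal to each other. Consequently,

$$\|X\|_2^2 = \sum_{\ell=1}^{L} \sum_{i \in \{0,1\}^d \setminus 0} \|Z^{(\ell, i)}\|_2^2 + \|Y^{(L)}\|_2^2 = \sum_{\ell=1}^{L} \left( 2^{d\ell} \sum_{i \in \{0,1\}^d \setminus 0} \|\bar{Z}^{(\ell, i)}\|_2^2 \right) + 2^{dL} \|\bar{Y}^{(L)}\|_2^2.$$

Since $X \in \tilde{S}_d$ implies $\|X\|_2 \leq 1$, we have

$$\sum_{\ell=1}^{L} \left( 2^{d\ell} \sum_{i \in \{0,1\}^d \setminus 0} \|\bar{Z}^{(\ell, i)}\|_2^2 \right) + 2^{dL} \|\bar{Y}^{(L)}\|_2^2 \leq 1. \tag{17}$$

• It can be shown that

$$\|DY^{(\ell)}\|_1 \leq \|DY^{(\ell-1)}\|_1 \tag{18}$$

and, consequently,

$$\|DY^{(\ell)}\|_1 \leq \|DX\|_1 \leq 4\sqrt{d}\sqrt{K}. \tag{19}$$

Let $D_i$ be the difference matrix along the $i$-th dimension. Then, similar to the 1-D case, one can show that $\|D_i Y^{(\ell)}\|_1 \leq 2^{d-1} \cdot \frac{1}{2^{d-1}} \|D_i Y^{(\ell-1)}\|_1 = \|D_i Y^{(\ell-1)}\|_1$. Summing over $i$ yields (18).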

• Furthermore, for any vector G, we have 



ie{o,i} d \o 



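where $G^{(\ell-1)} \in \mathbb{R}^{(N/2^{\ell-1})^d}$ is a $d$-dimensional signal whose $i$-th entry is the sum of the entries of $G$ on the $i$-th block of size $2^{\ell-1} \times \cdots \times 2^{\ell-1}$. For simplicity, we prove it for $d = 2$. The remaining cases can be shown analogously. When $d = 2$, the four filters are

$$H^{(0,0)} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad H^{(1,0)} = \begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix}, \quad H^{(0,1)} = \begin{bmatrix} 1 & 1 \\ -1 & -1 \end{bmatrix}, \quad H^{(1,1)} = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}.$$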



Let $D_1$ and $D_2$ be the finite differences along the horizontal and vertical directions, respectively. Then it is easy to check that

||L>(a 00 if ® l 2 *-i + ai -H- (1,0) ® la*-i + a i# (04) ® l 2 *-i + a xl H {1 ^ ® l 2 ,-i)||i 
= ||L>i(a oif (0 ' 0) ® l 2 «-i + aioi? (1 ' 0) ® l a *-i + aoi-H" (0,1) ® l 2 ,-i + a lx H {1 ^ ® l 2 i-i)||i 

+ j|L> 2 (a 00 if (0 ' 0) ® l a <-i + aio# (1 ' 0) ® la«-i + aoii^ ' 1 ) ® lai-i + aiii? (M) ® l 2 *-i)||i 
= 2 _1 (|a i + an | + |ooi — an| + |oio + on| + |oio - an|). 
Therefore, 

2 ^-i ( || Z (A0i) +z (Aii)|| 1 + || Z (f,oi) _ ^.uJUj + || Z (*,io) + ^.n)^ + ||Z^ 10 > - Z^\\ x ) 

< + z^ 10 ) + z^ + z^w, = \\Dt^\u < 4V2VF. 



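Furthermore, if we let $G^{(\ell-1)}_{oo}$ be a downsampling of $G^{(\ell-1)}$ on the odd-odd indices, and similarly define $G^{(\ell-1)}_{oe}$,
$G^{(\ell-1)}_{eo}$, and $G^{(\ell-1)}_{ee}$, then

$$\begin{aligned}
\langle G, Z^{(\ell,10)} + Z^{(\ell,01)} + Z^{(\ell,11)} \rangle
&= \langle G^{(\ell-1)}, \tilde{Z}^{(\ell,10)} \otimes H^{(1,0)} + \tilde{Z}^{(\ell,01)} \otimes H^{(0,1)} + \tilde{Z}^{(\ell,11)} \otimes H^{(1,1)} \rangle \\
&= \langle G^{(\ell-1)}_{oo}, \tilde{Z}^{(\ell,01)} + \tilde{Z}^{(\ell,10)} + \tilde{Z}^{(\ell,11)} \rangle + \langle G^{(\ell-1)}_{oe}, -\tilde{Z}^{(\ell,01)} + \tilde{Z}^{(\ell,10)} - \tilde{Z}^{(\ell,11)} \rangle \\
&\quad + \langle G^{(\ell-1)}_{eo}, \tilde{Z}^{(\ell,01)} - \tilde{Z}^{(\ell,10)} - \tilde{Z}^{(\ell,11)} \rangle + \langle G^{(\ell-1)}_{ee}, -\tilde{Z}^{(\ell,01)} - \tilde{Z}^{(\ell,10)} + \tilde{Z}^{(\ell,11)} \rangle \\
&\le \|G^{(\ell-1)}_{oo}\|_\infty \|\tilde{Z}^{(\ell,01)} + \tilde{Z}^{(\ell,10)} + \tilde{Z}^{(\ell,11)}\|_1 + \|G^{(\ell-1)}_{oe}\|_\infty \|{-\tilde{Z}^{(\ell,01)}} + \tilde{Z}^{(\ell,10)} - \tilde{Z}^{(\ell,11)}\|_1 \\
&\quad + \|G^{(\ell-1)}_{eo}\|_\infty \|\tilde{Z}^{(\ell,01)} - \tilde{Z}^{(\ell,10)} - \tilde{Z}^{(\ell,11)}\|_1 + \|G^{(\ell-1)}_{ee}\|_\infty \|{-\tilde{Z}^{(\ell,01)}} - \tilde{Z}^{(\ell,10)} + \tilde{Z}^{(\ell,11)}\|_1.
\end{aligned}$$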
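Since $\tilde{Z}^{(\ell,01)} = \frac{1}{2}\big((\tilde{Z}^{(\ell,01)} + \tilde{Z}^{(\ell,11)}) + (\tilde{Z}^{(\ell,01)} - \tilde{Z}^{(\ell,11)})\big)$, $\tilde{Z}^{(\ell,10)} = \frac{1}{2}\big((\tilde{Z}^{(\ell,10)} + \tilde{Z}^{(\ell,11)}) + (\tilde{Z}^{(\ell,10)} - \tilde{Z}^{(\ell,11)})\big)$, and $\tilde{Z}^{(\ell,11)} = \frac{1}{2}\big((\tilde{Z}^{(\ell,01)} + \tilde{Z}^{(\ell,11)}) - (\tilde{Z}^{(\ell,01)} - \tilde{Z}^{(\ell,11)})\big)$, we have

$$\begin{aligned}
\|\tilde{Z}^{(\ell,01)} + \tilde{Z}^{(\ell,10)} + \tilde{Z}^{(\ell,11)}\|_1
&\le \tfrac{1}{2}\big(\|\tilde{Z}^{(\ell,01)} + \tilde{Z}^{(\ell,11)}\|_1 + \|\tilde{Z}^{(\ell,01)} - \tilde{Z}^{(\ell,11)}\|_1\big) \\
&\quad + \tfrac{1}{2}\big(\|\tilde{Z}^{(\ell,10)} + \tilde{Z}^{(\ell,11)}\|_1 + \|\tilde{Z}^{(\ell,10)} - \tilde{Z}^{(\ell,11)}\|_1\big) \\
&\quad + \tfrac{1}{2}\big(\|\tilde{Z}^{(\ell,01)} + \tilde{Z}^{(\ell,11)}\|_1 + \|\tilde{Z}^{(\ell,01)} - \tilde{Z}^{(\ell,11)}\|_1\big) \\
&\le \frac{3}{2} \cdot \frac{4\sqrt{2}\sqrt{K}}{2^{\ell-1}},
\end{aligned}$$

and similarly,

$$\|\tilde{Z}^{(\ell,01)} - \tilde{Z}^{(\ell,10)} - \tilde{Z}^{(\ell,11)}\|_1,\ \ \|{-\tilde{Z}^{(\ell,01)}} - \tilde{Z}^{(\ell,10)} + \tilde{Z}^{(\ell,11)}\|_1,\ \ \|{-\tilde{Z}^{(\ell,01)}} + \tilde{Z}^{(\ell,10)} - \tilde{Z}^{(\ell,11)}\|_1 \ \le\ \frac{3}{2} \cdot \frac{4\sqrt{2}\sqrt{K}}{2^{\ell-1}}.$$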



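Therefore, since each of the four downsampled signals has sup norm at most $\|G^{(\ell-1)}\|_\infty$,

$$\langle G, Z^{(\ell,10)} + Z^{(\ell,01)} + Z^{(\ell,11)} \rangle \le 4 \cdot \frac{3}{2} \cdot \frac{4\sqrt{2}\sqrt{K}}{2^{\ell-1}}\, \|G^{(\ell-1)}\|_\infty = 8 \cdot \sqrt{2} \cdot 3 \cdot \frac{\sqrt{K}}{2^{\ell-1}}\, \|G^{(\ell-1)}\|_\infty.$$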



Now we are ready to estimate the Gaussian width of $S_d$. Let $G$ be a vector whose entries are i.i.d.
Gaussian random variables with mean 0 and variance 1. The same argument as in the one-dimensional case leads
to

$$E(\|G^{(\ell)}\|_\infty) \le \sqrt{2^{d\ell} \cdot 2 \ln\!\big(e^{1/2} N^d / 2^{d\ell}\big)},$$



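which implies

$$\begin{aligned}
E\Big( \sum_{i \in \{0,1\}^d \setminus 0} \langle G, Z^{(\ell,i)} \rangle \Big)
&\le 8\sqrt{d}\,(2^d - 1)\, \frac{\sqrt{K}}{2^{(\ell-1)(d-1)}}\, E(\|G^{(\ell-1)}\|_\infty) \\
&\le 8\sqrt{d}\,(2^d - 1)\, \frac{\sqrt{K}}{2^{(\ell-1)(d-1)}}\, \sqrt{2^{d(\ell-1)} \cdot 2 \ln\!\big(e^{1/2} N^d / 2^{d(\ell-1)}\big)} \\
&\le 8\sqrt{d}\,(2^d - 1)\, \frac{\sqrt{K}}{2^{(\ell-1)(d-1)}}\, \sqrt{2^{d(\ell-1)} \cdot 2 \ln\!\big(e^{1/2} N^d\big)} \\
&= 8\sqrt{d}\,(2^d - 1)\, \sqrt{K}\, 2^{(\ell-1)(1 - d/2)}\, \sqrt{2 \ln\!\big(e^{1/2} N^d\big)}.
\end{aligned}$$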
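As a rough plausibility check of the sup-norm bound on the block sums of $G$, one can estimate the expectation by simulation. The sketch below assumes $d = 2$, a grid of side $N = 16$ and blocks of side $2^{\ell}$ with $\ell = 2$; these values, the seed, and the function names `block_sum_sup_norm` and `width_bound` are illustrative choices, not taken from the paper. A Monte Carlo average is of course not a proof, only a consistency check.

```python
import math
import random


def block_sum_sup_norm(n_side, ell, rng):
    """Draw G with i.i.d. N(0,1) entries on an n_side x n_side grid (d = 2)
    and return the sup norm of its sums over blocks of side 2**ell."""
    b = 2 ** ell
    g = [[rng.gauss(0.0, 1.0) for _ in range(n_side)] for _ in range(n_side)]
    best = 0.0
    for r in range(0, n_side, b):
        for c in range(0, n_side, b):
            s = sum(g[r + a][c + k] for a in range(b) for k in range(b))
            best = max(best, abs(s))
    return best


def width_bound(n_side, ell, d=2):
    """Evaluate sqrt(2^{d*ell} * 2 * ln(e^{1/2} * N^d / 2^{d*ell}))."""
    blocks = n_side ** d / 2 ** (d * ell)  # number of block sums
    return math.sqrt(2 ** (d * ell) * 2.0 * math.log(math.sqrt(math.e) * blocks))


rng = random.Random(0)  # fixed seed for reproducibility
trials = 200
estimate = sum(block_sum_sup_norm(16, 2, rng) for _ in range(trials)) / trials
bound = width_bound(16, 2)
```

With these settings each block sum is a centered Gaussian of variance $2^{d\ell} = 16$ and there are 16 of them, so `estimate` should fall below `bound`; comparing the two for other values of `n_side` and `ell` shows how much slack the logarithmic factor leaves.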



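Moreover,

$$E(\langle G, Y^{(L)} \rangle) = E(\tilde{Y}^{(L)} G^{(L)}) \le |\tilde{Y}^{(L)}|\, E(\|G^{(L)}\|_\infty) \le \frac{1}{\sqrt{2^{dL}}} \sqrt{2^{dL} \cdot 2 \ln\!\big(e^{1/2} N^d / 2^{dL}\big)} \le \sqrt{3}.$$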
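Therefore,

$$\begin{aligned}
E(\langle G, X \rangle)
&= \sum_{\ell=1}^{L} E\Big( \sum_{i \in \{0,1\}^d \setminus 0} \langle G, Z^{(\ell,i)} \rangle \Big) + E(\langle G, Y^{(L)} \rangle) \\
&\le 8\sqrt{d}\,(2^d - 1)\, \sqrt{K}\, \sqrt{2 \ln\!\big(e^{1/2} N^d\big)} \sum_{\ell=1}^{L} 2^{(\ell-1)(1 - d/2)} + \sqrt{3} \\
&\le \begin{cases}
8\sqrt{d}\,(2^d - 1)\, \sqrt{K}\, \sqrt{2 \ln\!\big(e^{1/2} N^d\big)}\, \log_2 N + \sqrt{3} & \text{if } d = 2, \\[4pt]
\dfrac{1}{1 - 2^{1 - d/2}}\, 8\sqrt{d}\,(2^d - 1)\, \sqrt{K}\, \sqrt{2 \ln\!\big(e^{1/2} N^d\big)} + \sqrt{3} & \text{if } d > 2,
\end{cases}
\end{aligned}$$

where for $d = 2$ every term of the sum equals 1 and $L = \log_2 N$, while for $d > 2$ the sum is a geometric series bounded by $1/(1 - 2^{1-d/2})$.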
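The difference between the two cases comes entirely from the sum of $2^{(\ell-1)(1-d/2)}$ over the levels. A short sketch makes this concrete; it assumes $N = 2^L$, and the function name `level_sum` is hypothetical.

```python
def level_sum(levels, d):
    """Sum over l = 1..levels of 2^{(l-1)(1 - d/2)}."""
    return sum(2.0 ** ((l - 1) * (1 - d / 2)) for l in range(1, levels + 1))


# d = 2: every term is 1, so the sum equals the number of levels L = log2(N).
sum_d2 = level_sum(10, 2)

# d = 3: geometric series with ratio 2^{-1/2}, bounded independently of L.
sum_d3 = level_sum(20, 3)
geometric_limit_d3 = 1.0 / (1.0 - 2.0 ** (1 - 3 / 2))
```

For $d = 2$ the sum grows like $\log_2 N$, which is where the extra logarithmic factors in the measurement count come from, whereas for $d > 2$ it stays below a constant that depends only on $d$.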



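We require the Gaussian width to be about $\sqrt{M}$, where $M$ is the number of measurements. So, up to a constant depending only on $d$, we have

$$M \sim \begin{cases}
K \log_2^2 N \ln N & \text{if } d = 2, \\
K \ln N & \text{if } d > 2.
\end{cases}$$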
4 Conclusion



In this paper, we establish the proof for the performance guarantee of total variation (TV) minimization in
recovering one-dimensional signals with sparse gradient support. This partially answers the open problem
of proving the fidelity of total variation minimization in such a setting [20]. We also extend our results
to TV minimization for multidimensional signals. Recoverable sparsity thresholds of TV minimization are
explicitly computed for 1-dimensional signals by using the Grassmann angle framework.

Our current results work only for the Gaussian ensemble of measurement matrices. One future direction
is to extend our results to general deterministic and random measurement matrices, such as partial Fourier
matrices and random Bernoulli matrices. Another direction we would like to pursue is to tighten our bounds
for 1-dimensional signal vectors. For multidimensional signals, we conjecture that for Gaussian measurement
operators, when the number of measurements is proportional to the problem dimension $N^d$, the recoverable
sparsity of the gradient support under TV minimization can also grow proportionally with $N^d$. We are also
interested in working towards tightening our results in this direction. The almost Euclidean property of
subspaces [TTK^niEO] can be further used to extend our results to proving the stability of TV minimization 
for signals with approximately sparse gradients, under noisy measurements. 